Clustering Based Feature Learning on Variable Stars
نویسندگان
چکیده
The success of automatic classification of variable stars strongly depends on the lightcurve representation. Usually, lightcurves are represented as a vector of many statistical descriptors designed by astronomers called features. These descriptors commonly demand significant computational power to calculate, require substantial research effort to develop and do not guarantee good performance on the final classification task. Today, lightcurve representation is not entirely automatic; algorithms that extract lightcurve features are designed by humans and must be manually tuned up for every survey. The vast amounts of data that will be generated in future surveys like LSST mean astronomers must develop analysis pipelines that are both scalable and automated. Recently, substantial efforts have been made in the machine learning community to develop methods that prescind from expert-designed and manually tuned features for features that are automatically learned from data. In this work we present what is, to our knowledge, the first unsupervised feature learning algorithm designed for variable objects. Our method works by extracting a large number of lightcurve subsequences from a given set of photometric data, which are then clustered to find common local patterns in the time series. Representatives of these common patterns, called exemplars, are then used to transform lightcurves of a labeled set into a new representation that can then be used to train an automatic classifier. The proposed algorithm learns the features from both labeled and unlabeled lightcurves, overcoming the bias generated when the learning process is done only with labeled lightcurves. We test our method on MACHO and OGLE datasets; the results show that the classification performance we achieve is as good and in some cases better than the performance achieved using traditional statistical features, while the computational cost is significantly lower. With these promising results, we believe that our method constitutes a significant step towards the automatization of the lightcurve classification pipeline.
منابع مشابه
Intrusion Detection based on a Novel Hybrid Learning Approach
Information security and Intrusion Detection System (IDS) plays a critical role in the Internet. IDS is an essential tool for detecting different kinds of attacks in a network and maintaining data integrity, confidentiality and system availability against possible threats. In this paper, a hybrid approach towards achieving high performance is proposed. In fact, the important goal of this paper ...
متن کاملA Discriminative Latent Variable Model for Online Clustering
This paper presents a latent variable structured prediction model for discriminative supervised clustering of items called the Latent Left-linking Model (LM). We present an online clustering algorithm for LM based on a feature-based item similarity function. We provide a learning framework for estimating the similarity function and present a fast stochastic gradient-based learning technique. In...
متن کاملSteel Consumption Forecasting Using Nonlinear Pattern Recognition Model Based on Self-Organizing Maps
Steel consumption is a critical factor affecting pricing decisions and a key element to achieve sustainable industrial development. Forecasting future trends of steel consumption based on analysis of nonlinear patterns using artificial intelligence (AI) techniques is the main purpose of this paper. Because there are several features affecting target variable which make the analysis of relations...
متن کاملComposite Kernel Optimization in Semi-Supervised Metric
Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...
متن کاملDiagnosis of Heart Disease Based on Meta Heuristic Algorithms and Clustering Methods
Data analysis in cardiovascular diseases is difficult due to large massive of information. All of features are not impressive in the final results. So it is very important to identify more effective features. In this study, the method of feature selection with binary cuckoo optimization algorithm is implemented to reduce property. According to the results, the most appropriate classification fo...
متن کاملStock Price Prediction using Machine Learning and Swarm Intelligence
Background and Objectives: Stock price prediction has become one of the interesting and also challenging topics for researchers in the past few years. Due to the non-linear nature of the time-series data of the stock prices, mathematical modeling approaches usually fail to yield acceptable results. Therefore, machine learning methods can be a promising solution to this problem. Methods: In this...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1602.08977 شماره
صفحات -
تاریخ انتشار 2016